feat: support Spark expression slice by andygrove · Pull Request #4149 · apache/datafusion-comet

andygrove · 2026-04-29T23:25:10Z

Which issue does this PR close?

Closes #.

Rationale for this change

Add native support for Spark's slice(array, start, length) expression so it runs on Comet instead of falling back to Spark.

The datafusion-spark crate already ships a SparkSlice, but it is not Spark-compatible: when a negative start lies before the beginning of the array (e.g. slice([a], -2, 2)), it returns the first element instead of an empty array. We can upstream the fix later; for now this PR ships a Comet-local implementation.

What changes are included in this PR?

native/spark-expr/src/array_funcs/array_slice.rs: new SparkArraySlice UDF (spark_array_slice) implementing Spark's slice semantics, including 1-based indexing, negative-start-from-end, error on start = 0 or length < 0, and clamping length to the array end. Supports both List and LargeList element storage.
native/spark-expr/src/comet_scalar_funcs.rs: register the new UDF.
spark/src/main/scala/org/apache/comet/serde/arrays.scala: CometSlice serde casts the start/length args to Long and serialises a call to spark_array_slice, promising containsNull = true to match DataFusion's list nullability.
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: register Slice in arrayExpressions.

How are these changes tested?

12 native unit tests in array_slice.rs covering positive / negative / zero / overflowing start, length 0, length past end, null inputs, empty arrays, and the error cases.
New SQL test file spark/src/test/resources/sql-tests/expressions/array/slice.sql covering all-literal, column + literal, and column-only argument combinations across boolean, tinyint, smallint, int, bigint, float, double, decimal, date, timestamp, timestamp_ntz, string, and nested array element types, plus the negative-start-overflow case that exposed the upstream bug.

kazuyukitanimura

pending on the conflict resolution

comphead · 2026-05-20T19:04:12Z

Filed apache/datafusion#22400 to fix edge case in upstream

# Conflicts: # native/spark-expr/src/comet_scalar_funcs.rs

andygrove added 2 commits April 29, 2026 17:24

feat: support Spark expression slice

1f6133a

refactor: tighten slice_list inner loop

4a46ed4

andygrove marked this pull request as ready for review April 30, 2026 01:03

andygrove added this to Comet Development May 13, 2026

github-project-automation Bot moved this to Todo in Comet Development May 13, 2026

andygrove moved this from Todo to In progress in Comet Development May 13, 2026

andygrove added this to the 0.17.0 (June 2026) milestone May 13, 2026

kazuyukitanimura approved these changes May 20, 2026

View reviewed changes

andygrove mentioned this pull request May 20, 2026

chore: wire shiftrightunsigned #4375

Open

Merge remote-tracking branch 'apache/main' into feat/array-slice

e1be981

# Conflicts: # native/spark-expr/src/comet_scalar_funcs.rs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support Spark expression slice#4149

feat: support Spark expression slice#4149
andygrove wants to merge 3 commits into
apache:mainfrom
andygrove:feat/array-slice

andygrove commented Apr 29, 2026

Uh oh!

kazuyukitanimura left a comment

Uh oh!

comphead commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

andygrove commented Apr 29, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

kazuyukitanimura left a comment

Choose a reason for hiding this comment

Uh oh!

comphead commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants